Overview

Dataset statistics

Number of variables: 8
Number of observations: 24639620
Missing cells: 355922
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.5 GiB
Average record size in memory: 64.0 B
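
The memory and missing-cell figures above can be cross-checked with a little arithmetic. The sketch below assumes every cell occupies 8 bytes (a 64-bit number, timestamp, or object pointer); the report does not state this, it is only the width that happens to reproduce the reported sizes.

```python
# Consistency check for the dataset statistics.
# Assumption (not stated in the report): each cell occupies 8 bytes.
N_ROWS = 24_639_620
N_COLS = 8
BYTES_PER_CELL = 8  # assumed cell width
MISSING_CELLS = 355_922

record_bytes = N_COLS * BYTES_PER_CELL                  # bytes per row
column_mib = N_ROWS * BYTES_PER_CELL / 2**20            # one column, in MiB
total_gib = N_ROWS * record_bytes / 2**30               # whole frame, in GiB
missing_pct = 100 * MISSING_CELLS / (N_ROWS * N_COLS)   # share of empty cells

print(f"record size:     {record_bytes} B")
print(f"per-column size: {column_mib:.1f} MiB")
print(f"total size:      {total_gib:.1f} GiB")
print(f"missing cells:   {missing_pct:.1f}%")
```

Under that assumption the results round to 64 B per record, 188.0 MiB per column, 1.5 GiB in total, and 0.2% missing cells, matching the report.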

Variable types

CAT: 5
NUM: 2
DATE: 1

Warnings

Componente is highly correlated with Variable [High correlation]
Variable is highly correlated with Componente [High correlation]
Tipo is highly correlated with Ponderación [High correlation]
Ponderación is highly correlated with Tipo [High correlation]
Valor is highly skewed (γ1 = -519.2066934) [Skewed]
df_index has unique values [Unique]

Reproduction

Analysis started: 2020-11-02 01:49:56.045692
Analysis finished: 2020-11-02 01:57:06.984076
Duration: 7 minutes and 10.94 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 24639620
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24635388.33
Minimum: 0
Maximum: 49279236
Zeros: 1
Zeros (%): < 0.1%
Memory size: 188.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2460560.95
Q1: 12315292
median: 24634342.5
Q3: 36954650.25
95-th percentile: 46814348.05
Maximum: 49279236
Range: 49279236
Interquartile range (IQR): 24639358.25

Descriptive statistics

Standard deviation: 14226109.26
Coefficient of variation (CV): 0.5774664101
Kurtosis: -1.199992716
Mean: 24635388.33
Median Absolute Deviation (MAD): 12319661.5
Skewness: 0.0002424848105
Sum: 6.07006607e+14
Variance: 2.023821846e+14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
25169918 | 1 | < 0.1%
48679309 | 1 | < 0.1%
27697536 | 1 | < 0.1%
31893889 | 1 | < 0.1%
19313026 | 1 | < 0.1%
1811456 | 1 | < 0.1%
48662917 | 1 | < 0.1%
2527622 | 1 | < 0.1%
33774481 | 1 | < 0.1%
23525771 | 1 | < 0.1%
Other values (24639610) | 24639610 | > 99.9%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
3 | 1 | < 0.1%
5 | 1 | < 0.1%
7 | 1 | < 0.1%

Value | Count | Frequency (%)
49279236 | 1 | < 0.1%
49279233 | 1 | < 0.1%
49279232 | 1 | < 0.1%
49279231 | 1 | < 0.1%
49279229 | 1 | < 0.1%

Estación
Categorical

Distinct: 30
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 188.0 MiB

CAI Venecia: 1495739
CAI 20 de Julio: 1474771
Estación Monitoreo Ruido Inteligente 7: 1410095
Edificio Marly: 1159797
CAI Villa Nidia: 1127938
Other values (25): 17971280
Value | Count | Frequency (%)
CAI Venecia | 1495739 | 6.1%
CAI 20 de Julio | 1474771 | 6.0%
Estación Monitoreo Ruido Inteligente 7 | 1410095 | 5.7%
Edificio Marly | 1159797 | 4.7%
CAI Villa Nidia | 1127938 | 4.6%
CAI Claret | 1105881 | 4.5%
Santa Cecilia | 1105296 | 4.5%
CAI Alamos | 1103731 | 4.5%
CAI Normandia | 1088263 | 4.4%
CAI Rincon | 1085760 | 4.4%
Other values (20) | 12482349 | 50.7%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 39
Median length: 12
Mean length: 14.26484702
Min length: 3

Variable
Categorical

HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 188.0 MiB

Leq: 4908126
Lmin: 3957737
L10: 3957165
Lmax: 3956875
L90: 3956054
Other values (7): 3903663
Value | Count | Frequency (%)
Leq | 4908126 | 19.9%
Lmin | 3957737 | 16.1%
L10 | 3957165 | 16.1%
Lmax | 3956875 | 16.1%
L90 | 3956054 | 16.1%
L50 | 3725702 | 15.1%
Velocidad del Viento | 29811 | 0.1%
Temperatura Ambiente | 29742 | 0.1%
Dirección del Viento | 29739 | 0.1%
Humedad Relativa | 29696 | 0.1%
Other values (2) | 58973 | 0.2%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 20
Median length: 3
Mean length: 3.429623833
Min length: 3

Componente
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 188.0 MiB

Ruido: 24461659
Meteorología: 177961
Value | Count | Frequency (%)
Ruido | 24461659 | 99.3%
Meteorología | 177961 | 0.7%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 12
Median length: 5
Mean length: 5.050557882
Min length: 5

Fecha
Date

Distinct: 98402
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 188.0 MiB
Minimum: 2017-06-30 08:00:00
Maximum: 2020-08-07 23:00:58
Histogram with fixed size bins (bins=50)

Valor
Real number (ℝ)

SKEWED

Distinct: 1267260
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.85356688
Minimum: -180819
Maximum: 15971.49995
Zeros: 67366
Zeros (%): 0.3%
Memory size: 188.0 MiB

Quantile statistics

Minimum: -180819
5-th percentile: 15.8
Q1: 41.2
median: 53.3
Q3: 63.97430023
95-th percentile: 82.40000153
Maximum: 15971.49995
Range: 196790.5
Interquartile range (IQR): 22.77430023
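
The range and IQR above follow directly from the other quantile statistics, and the IQR gives a quick sense of how far the extremes sit from the bulk of the data. The sketch below recomputes both and then applies Tukey's 1.5 × IQR fences. The fences are an illustrative outlier rule chosen here, not something the report itself computes.

```python
# Values copied from the quantile statistics for Valor.
Q1, Q3 = 41.2, 63.97430023
MINIMUM, MAXIMUM = -180819.0, 15971.49995

iqr = Q3 - Q1                    # interquartile range
value_range = MAXIMUM - MINIMUM  # range

# Tukey's rule (illustrative choice, not part of the report):
# points beyond Q1 - k*IQR or Q3 + k*IQR are flagged as potential outliers.
k = 1.5
lower_fence = Q1 - k * iqr
upper_fence = Q3 + k * iqr

print(f"IQR:    {iqr:.8f}")
print(f"Range:  {value_range:.1f}")
print(f"Fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print("minimum below lower fence:", MINIMUM < lower_fence)
print("maximum above upper fence:", MAXIMUM > upper_fence)
```

Both the minimum and the maximum fall far outside these fences, which is consistent with the extreme skewness and kurtosis reported for this variable.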

Descriptive statistics

Standard deviation: 326.0765831
Coefficient of variation (CV): 6.540687127
Kurtosis: 284395.7524
Mean: 49.85356688
Median Absolute Deviation (MAD): 11.3
Skewness: -519.2066934
Sum: 1228372943
Variance: 106325.938
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
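
A skewness this large usually means a handful of extreme values dominate the third moment. The sketch below uses a small hypothetical sample (not the real data) to show how a single value the size of the reported minimum drives skewness strongly negative. It computes the population form g1 = m3 / m2^(3/2); pandas applies a bias correction, so its output differs slightly on small samples.

```python
def skewness(xs):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical sample: 1000 values in the typical range of Valor.
typical = [41.2, 47.5, 53.3, 58.0, 63.9] * 200

# The same sample plus one value equal to the reported minimum.
with_outlier = typical + [-180819.0]

print(f"skewness without outlier: {skewness(typical):.3f}")
print(f"skewness with outlier:    {skewness(with_outlier):.3f}")
```

The first sample is close to symmetric, while adding one extreme negative value pushes the skewness to roughly the negative square root of the sample size.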
Value | Count | Frequency (%)
0 | 67366 | 0.3%
-999 | 59777 | 0.2%
57 | 58474 | 0.2%
56 | 58440 | 0.2%
56.5 | 58438 | 0.2%
55.5 | 58221 | 0.2%
59.5 | 57821 | 0.2%
55 | 57732 | 0.2%
54.5 | 57664 | 0.2%
56.3 | 57490 | 0.2%
Other values (1267250) | 24048197 | 97.6%

Value | Count | Frequency (%)
-180819 | 59 | < 0.1%
-179820 | 12 | < 0.1%
-173826 | 1 | < 0.1%
-166833 | 1 | < 0.1%
-158841 | 1 | < 0.1%

Value | Count | Frequency (%)
15971.49995 | 1 | < 0.1%
14073.40002 | 1 | < 0.1%
11243.80003 | 1 | < 0.1%
10497.20003 | 1 | < 0.1%
10403.60003 | 1 | < 0.1%

Ponderación
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 177961
Missing (%): 0.7%
Memory size: 188.0 MiB

Lin: 21800425
A: 1920324
C: 740910
Value | Count | Frequency (%)
Lin | 21800425 | 88.5%
A | 1920324 | 7.8%
C | 740910 | 3.0%
(Missing) | 177961 | 0.7%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 3
Median length: 3
Mean length: 2.783987415
Min length: 1

Tipo
Categorical

HIGH CORRELATION

Distinct: 40
Distinct (%): < 0.1%
Missing: 177961
Missing (%): 0.7%
Memory size: 188.0 MiB

Leq: 874096
Impulso: 867638
Pico: 740946
1/3 Oct 10kHz: 678502
1/3 Oct 100Hz: 678485
Other values (35): 20621992
Value | Count | Frequency (%)
Leq | 874096 | 3.5%
Impulso | 867638 | 3.5%
Pico | 740946 | 3.0%
1/3 Oct 10kHz | 678502 | 2.8%
1/3 Oct 100Hz | 678485 | 2.8%
1/3 Oct 25Hz | 678311 | 2.8%
1/3 Oct 6.3kHz | 678282 | 2.8%
1/3 Oct 315Hz | 678255 | 2.8%
1/3 Oct 20kHz | 678159 | 2.8%
1/3 Oct 12.5kHz | 678126 | 2.8%
Other values (30) | 17230859 | 69.9%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 15
Median length: 13
Mean length: 11.96864176
Min length: 3

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction determines the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
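
The definition above can be written out directly. This is a minimal pure-Python sketch on made-up numbers, not the routine the report itself uses; the function name `pearson_r` and the sample data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0 * v + 10.0 for v in x]    # exact increasing line
y_down = [-0.5 * v + 3.0 for v in x]  # exact decreasing line
x_shifted = [3.0 * v + 7.0 for v in x]  # location and scale change of x

print(pearson_r(x, y_up))          # close to +1
print(pearson_r(x, y_down))        # close to -1
print(pearson_r(x_shifted, y_up))  # unchanged by shifting and rescaling x
```

Because the same divisor n appears in the covariance and in both standard deviations, using n - 1 instead would give the same r.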

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
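
As a rough sketch of that recipe, the code below ranks each variable (ties receive the average of their positions, a common convention assumed here) and then applies the covariance-over-standard-deviations formula to the ranks. Function names and data are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    return correlation(average_ranks(xs), average_ranks(ys))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic but not linear

print("Pearson r: ", correlation(x, y))
print("Spearman ρ:", spearman_rho(x, y))
```

For this cubic relationship ρ reaches 1 while r stays below 1, which illustrates the difference described above.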

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
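
Taken literally, that description is the tau-a variant. The sketch below implements it by brute force over all pairs. Libraries commonly default to tau-b, which adds a correction for ties, so results can differ when the data contain ties. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six

print(kendall_tau_a(x, y))
print(kendall_tau_a(x, list(reversed(x))))
```

With six pairs, five concordant and one discordant, the first call gives (5 - 1) / 6; the fully reversed ordering gives -1. The pairwise loop is quadratic in the number of observations, so it is only suitable for small samples.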

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
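
The following is a minimal sketch of the bias-corrected estimator as it is commonly stated, not the report's own code: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1), and the numbers of rows r and columns k are shrunk in the same spirit. The function name and the toy category labels are hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    rows, cols = Counter(xs), Counter(ys)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(rows), len(cols)
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table: a variable with a single category
    return math.sqrt(phi2_corr / denom)

# Hypothetical toy data: one label fully determines the other.
component = ["Ruido"] * 50 + ["Meteorología"] * 50
weighting = ["Lin"] * 50 + ["none"] * 50
print(cramers_v_corrected(component, weighting))

# Hypothetical toy data: the two labels are independent.
left = ["a", "a", "b", "b"] * 25
right = ["u", "v", "u", "v"] * 25
print(cramers_v_corrected(left, right))
```

The first pair yields a value of 1 (perfect association) and the second yields 0 (independence).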

Missing values


Sample

First rows

  | df_index | Estación | Variable | Componente | Fecha | Valor | Ponderación | Tipo
0 | 42292150 | CAI Quirigua | Leq | Ruido | 2020-02-25 05:01:03 | 51.70 | Lin | 1/3 Oct 4kHz
1 | 42388519 | CAI Quirigua | Leq | Ruido | 2020-03-15 03:00:58 | 56.20 | Lin | 1/3 Oct 250Hz
2 | 43466573 | CAI Rincon | L50 | Ruido | 2019-07-20 16:00:00 | 61.70 | Lin | 1/3 Oct 125Hz
3 | 15421732 | Edificio Marly | L90 | Ruido | 2019-07-26 02:00:00 | 71.30 | C | Pico
4 | 29309660 | CAI Americas | Lmax | Ruido | 2019-08-16 19:00:00 | 78.20 | Lin | 1/3 Oct 1.25kHz
5 | 43249386 | CAI Rincon | L90 | Ruido | 2019-06-06 03:00:00 | 34.50 | Lin | 1/3 Oct 250Hz
6 | 40083635 | CAI Normandia | L90 | Ruido | 2019-12-18 10:00:31 | 64.80 | Lin | 1/3 Oct 40Hz
7 | 40720762 | CAI Normandia | L50 | Ruido | 2020-05-01 21:00:46 | 52.70 | Lin | 1/3 Oct 400Hz
8 | 8213734 | CAI Tejar | L90 | Ruido | 2020-06-04 20:00:31 | 55.20 | Lin | 1/3 Oct 40Hz
9 | 7908828 | CAI Tejar | L10 | Ruido | 2020-04-04 11:00:31 | 61.10 | Lin | 1/3 Oct 200Hz

Last rows

  | df_index | Estación | Variable | Componente | Fecha | Valor | Ponderación | Tipo
24639610 | 23804100 | Santa Cecilia | L10 | Ruido | 2019-10-18 22:00:00 | 57.60 | Lin | 1/3 Oct 25Hz
24639611 | 20723228 | Hotel Morrison | Lmin | Ruido | 2020-05-04 23:00:31 | 20.70 | Lin | 1/3 Oct 3.15kHz
24639612 | 37100377 | CAI Las Ferias | L10 | Ruido | 2020-03-08 06:00:58 | 61.20 | Lin | 1/3 Oct 1.25kHz
24639613 | 1963843 | Estación Monitoreo Ruido Inteligente 7 | Leq | Ruido | 2020-05-15 12:49:17 | 51.95 | Lin | 1/3 Oct 6.3Hz
24639614 | 10566042 | CAI Venecia | L90 | Ruido | 2020-01-30 02:00:58 | 27.90 | Lin | 1/3 Oct 2kHz
24639615 | 316098 | Estación Monitoreo Ruido Inteligente 13 | Leq | Ruido | 2020-03-22 19:22:17 | 14.97 | Lin | 1/3 Oct 10kHz
24639616 | 46589276 | CAI San Victorino | L90 | Ruido | 2019-06-08 08:00:00 | 67.10 | A | Leq
24639617 | 9580084 | CAI Venecia | L10 | Ruido | 2019-09-07 07:00:00 | 66.60 | Lin | 1/3 Oct 200Hz
24639618 | 4711404 | CAI 20 de Julio | Lmin | Ruido | 2019-08-13 00:00:00 | 32.20 | Lin | 1/3 Oct 1.25kHz
24639619 | 13130449 | CAI Villa Nidia | Leq | Ruido | 2019-06-28 04:00:00 | 54.80 | Lin | 1/3 Oct 315Hz